Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StreamFlatMapOptional error-prone check #2946

Merged
merged 4 commits into from
Nov 9, 2024

Conversation

schlosna
Copy link
Contributor

@schlosna schlosna commented Nov 8, 2024

Before this PR

JFR profiles indicated some uses of Stream<Optional<T>>.flatMap(Optional::stream) were causing unnecessary allocations & GC pressure.

After this PR

==COMMIT_MSG==
If one has a Stream<Optional<T>> stream of size N and does stream.flatMap(Optional::stream), you’ll end up allocating N extra streams — one for each Optional input element. When N is large, those allocations can cause extra GC cycles and pauses if allocation rate is high enough leading to issues with latency, throughput, and allocation sensitive code paths.

Stream.filter(Optional::isPresent).map(Optional::get) is more efficient than Stream.flatMap(Optional::stream) as it does not allocate a new Stream for every element in the stream.
==COMMIT_MSG==

Possible downsides?

If one has a `Stream<Optional<T>> stream` of size `N` and does
`stream.flatMap(Optional::stream)`, you’ll end up allocating `N`
extra streams — one for each `Optional` input element. When `N` is
large or on latency, throughput, and allocation sensitive code paths
those allocations can cause extra GC cycles or pauses if allocation
rate is high enough.

`Stream.filter(Optional::isPresent).map(Optional::get)` is more
efficient than `Stream.flatMap(Optional::stream)` as it does not
allocate a new `Stream` for every element in the stream.
@schlosna schlosna requested a review from carterkozak November 8, 2024 20:54
@changelog-app
Copy link

changelog-app bot commented Nov 8, 2024

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

If one has a Stream<Optional<T>> stream of size N and does stream.flatMap(Optional::stream), you’ll end up allocating N extra streams — one for each Optional input element. When N is large, those allocations can cause extra GC cycles and pauses if allocation rate is high enough leading to issues with latency, throughput, and allocation sensitive code paths.

Stream.filter(Optional::isPresent).map(Optional::get) is more efficient than Stream.flatMap(Optional::stream) as it does not allocate a new Stream for every element in the stream.

Check the box to generate changelog(s)

  • Generate changelog entry

@schlosna schlosna changed the title Add StreamFlatMapOptional Add StreamFlatMapOptional error-prone check Nov 8, 2024
Copy link
Contributor

@carterkozak carterkozak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic, thanks for putting it together!

I added a few comments which aren't blockers, goal is to give you as much context as possible since you mentioned that you'd like to write more of this style of check :-)

Comment on lines 63 to 66
String replacement = state.getSourceForNode(ASTHelpers.getReceiver(tree.getMethodSelect()))
+ ".filter(Optional::isPresent).map(Optional::get)";
SuggestedFix fix = SuggestedFix.builder()
.addImport("java.util.Optional")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be a bit safer to use qualifyType for these, in case something like the old guava Optional type is also imported in the same class.

e.g.

qualifiedType = SuggestedFixes.qualifyType(state, fix, "com.google.common.collect.Lists");

This replaces addImport and potentially uses a fully qualified reference if another type with the same simple name is already imported.

+ ".filter(Optional::isPresent).map(Optional::get)";
SuggestedFix fix = SuggestedFix.builder()
.addImport("java.util.Optional")
.replace(tree, replacement)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case there are multiple Stream.flatMap(Optional::stream) occurrences in a single call chain, it may be helpful to make a more specific replacement, allowing a single round of applying the fixes to produce a passing build.

Instead of replacing the full tree, we can do something along these lines:

.replace(
    state.getEndPosition(ASTHelpers.getReceiver(tree.getMethodSelect())),
    state.getEndPosition(tree),
    ".filter(Optional::isPresent).map(Optional::get)")

I imagine a test could produce this case using something along these lines:

Stream<String> f(Stream<Optional<Optional<String>>> in) {
    return in.flatMap(Optional::stream).flatMap(Optional::stream);
}

Comment on lines 57 to 62
if (STREAM_FLAT_MAP.matches(tree, state)) {
ExpressionTree stream = ASTHelpers.getReceiver(tree);
if (stream != null) {
List<? extends ExpressionTree> arguments = tree.getArguments();
if (arguments != null && arguments.size() == 1) {
if (OPTIONAL_STREAM.matches(arguments.get(0), state)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error-prone provides a utility function which may be helpful here. Sometimes I opt not to use it in order to get access to parameters required to build the suggested fix, but I figured I'd call it out in case you find it helpful:

private static final Matcher<ExpressionTree> 
STREAM_FLATMAP_OPTIONAL_STREAM = Matchers.methodInvocation(
    STREAM_FLAT_MAP,
    // Any of the three MatchTypes are reasonable in this case, given a single arg
    MatchType.LAST,
    OPTIONAL_STREAM);

@schlosna
Copy link
Contributor Author

schlosna commented Nov 9, 2024

Thanks @carterkozak , that looks much better. Added a few more test cases as well.

Copy link
Contributor

@carterkozak carterkozak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bulldozer-bot bulldozer-bot bot merged commit 09d4249 into develop Nov 9, 2024
5 checks passed
@bulldozer-bot bulldozer-bot bot deleted the ds/optional-stream branch November 9, 2024 01:52
@autorelease3
Copy link

autorelease3 bot commented Nov 9, 2024

Released 6.1.0

@pkoenig10
Copy link
Member

Hey I don't think this is a good as a blanket rule. It makes the code harder to read, less resilient to refactors, and encourages potentially error-prone patterns.

When devs see things like .map(Optional::get) they are encouraged to do that elsewhere and can easily forget the present check. The vast majority of code that does .flatMap(Optional::stream) is not performance sensitive enough for this to matter. While I agree this is useful to avoid allocations, I don't think the cost is unambiguously worth the benefit.

For example, this check changed roughly 65 instances of this in our internal authentication service in https://g.p.b/foundry/multipass/pull/21715.

@carterkozak @schlosna can we consider reverting this?

@schlosna
Copy link
Contributor Author

@pkoenig10 are there any places you have found that are incorrect? I just looked through that PR and all of the replacements look correct to me, and given the service this seems like a net improvement to avoid unnecessary GC overhead.

Optional is effectively boxing nullable references, which I know we like express in APIs, but streaming through these is an internal implementation detail that we should make efficient by default. I think you would agree that avoiding boxed types in performance sensitive places is beneficial given heavy use in MP. Most uses that this check fixes are on hot paths that are at least O(N) if not O(N * M) or worse, and are places we should not be eating additional allocation & CPU cycles on Optional. I can share a few internal JFR showing some services with 3-5% of total allocations due to Stream.flatMap(Optional::stream). I want to avoid future regressions, thus this automated fix.

I'm not convinced folks will incorrectly use Optional#get() more because of this check. IntelliJ will appropriately flag uses of Optional#get() that are not guarded by Optional#isPresent() or Optional#isEmpty(). Ideally folks would use those in combination with Optional#orElseThrow() and we would have error-prone checks enforcing at compile time, or in less performance sensitive places Optional#map().

See also "Effective Java" 3rd edition, Item 55: Return optionals judiciously and Stuart Marks' https://stuartmarks.wordpress.com/wp-content/uploads/2017/10/optionalmotherofallbikesheds-devoxxus2017.pdf for external bike shed opinions.

@pkoenig10
Copy link
Member

@pkoenig10 are there any places you have found that are incorrect?

I'm not suggesting that the automated refactors are incorrect - I agree that they are correct.

I can see the arguments both ways here. I certainly agree we should be doing this in performance sensitive places. But these allocations are inconsequential for 99% of the code we write.

I'm not convinced folks will incorrectly use Optional#get() more because of this check. IntelliJ will appropriately flag uses of Optional#get() that are not guarded by Optional#isPresent() or Optional#isEmpty().

I see devs makes these kinds of mistakes all the time - pattern-matching from existing code without thinking more deeply about what they are actually writing. IntelliJ lints and code reviews are tools to mitigate this, but they aren't foolproof in the way that compile-time checks (ie. typechecking) or tests are.

I won't belabor the point. I understand the arguments here but just don't personally agree that it is desirable to declare that all uses of .flatMap(Optional::stream) should be replaced by .filter(Optional::isPresent).map(Optional:get).

@carterkozak
Copy link
Contributor

I think the key piece is that sometimes this optimization is incredibly meaningful, but sometimes code doesn’t need to be optimal.

so, do we force folks to make this distinction based on their knowledge of the scope they’re working within, or do we eliminate the option so that folks don’t have to think about it. If we give them the option, are we confident they’ll get it right most of the time? Likely not, since copy/paste is common and tends to occur without much thought about the perf requirements of the original vs new code (and often times code which wasn’t meant to be hot becomes hot in a refactor or new call path).

I think the additional verbosity is a reasonable trade off, and likely to lead to fewer problems than the alternative, but it’s not a hill that I’d die on!

@pkoenig10
Copy link
Member

Makes sense, I don't disagree with reasons here, thanks for engaging!

@schlosna
Copy link
Contributor Author

#2946 is a draft PR to catch cases where one attempts to Stream#map(Optional::get) without Stream#filter(Optional::isPresent).

if (receiver != null) {
SuggestedFix.Builder fix = SuggestedFix.builder();
String optionalType = SuggestedFixes.qualifyType(state, fix, Optional.class.getCanonicalName());
String replacement = ".filter(" + optionalType + "::isPresent).map(" + optionalType + "::get)";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use orElseThrow instead of get? The Java docs state that orElseThrow should be preferred and I've come to agree with this recommendation.

https://docs.oracle.com/javase/10/docs/api/java/util/Optional.html#get()

Screenshot 2024-11-11 at 10 39 39 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call #2950

@ash211
Copy link
Contributor

ash211 commented Nov 15, 2024

Current JDKs make us choose between what's "readable":

Stream<Optional<MyObject>> mystream;

mystream
  .flatMap(Optional::stream)

and what's efficient:

Stream<Optional<MyObject>> mystream;

mystream
  .filter(Optional::isPresent)
  .map(Optional::get)

Is there a possible API addition to the JDK that would be both readable and efficient?

@wi11dey
Copy link

wi11dey commented Nov 22, 2024

noting that for projects that use StreamEx, .mapPartial(Function.identity()) should have the same perf characteristics of .filter(Optional::isPresent).map(Optional::get) based on the implementation https://github.com/amaembo/streamex/blob/master/src/main/java/one/util/streamex/AbstractStreamEx.java#L768-L771, and is arguably as readable/idiomatic as .flatMap(Optional::stream)

@schlosna
Copy link
Contributor Author

schlosna commented Jan 15, 2025

Current JDKs make us choose between what's "readable":

Stream<Optional<MyObject>> mystream;

mystream
  .flatMap(Optional::stream)

and what's efficient:

Stream<Optional<MyObject>> mystream;

mystream
  .filter(Optional::isPresent)
  .map(Optional::get)

Is there a possible API addition to the JDK that would be both readable and efficient?

Using Stream .mapMulti(Optional::ifPresent) would be the readable & efficient JDK 16+ version. See https://github.com/Nikolas-Charalambidis/java-16-mapmulti-benchmark (h/t @westinrm ).

I have not had cycles to try automating the refactor from .flatMap(Optional::stream) to .<T>mapMulti(Optional::ifPresent) as the generic type inference is a bit beyond my error-prone kung-fu and limited hacking time.

@pkoenig10
Copy link
Member

Given #2898, I think it's safe to make that change.

We probably would also want to replace .filter(Optional::isPresent).map(Optional::get) with this.

I have not had cycles to try automating the refactor from .flatMap(Optional::stream) to .mapMulti(Optional::ifPresent) as the generic type inference is a bit beyond my error-prone kung-fu and limited hacking time.

One option is to cheat a bit and perform a regex match on the source code.

@schlosna
Copy link
Contributor Author

Started a draft in #2996 but need to tweak the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants